local dependency
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Education (0.46)
- Leisure & Entertainment (0.46)
- Information Technology (0.46)
- North America > United States > Texas > Travis County > Austin (0.05)
- Europe > United Kingdom > England > Kent > Canterbury (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (5 more...)
Sm: enhanced localization in Multiple Instance Learning for medical imaging classification
Multiple Instance Learning (MIL) is widely used in medical imaging classification to reduce the labeling effort. While only bag labels are available for training, one typically seeks predictions at both bag and instance levels (classification and localization tasks, respectively). Early MIL methods treated the instances in a bag independently. Recent methods account for global and local dependencies among instances. Although they have yielded excellent results in classification, their performance in terms of localization is comparatively limited.
ELDEN: Exploration via Local Dependencies
Tasks with large state space and sparse rewards present a longstanding challenge to reinforcement learning. In these tasks, an agent needs to explore the state space efficiently until it finds a reward. To deal with this problem, the community has proposed to augment the reward function with intrinsic reward, a bonus signal that encourages the agent to visit interesting states. In this work, we propose a new way of defining interesting states for environments with factored state spaces and complex chained dependencies, where an agent's actions may change the value of one entity that, in order, may affect the value of another entity. Our insight is that, in these environments, interesting states for exploration are states where the agent is uncertain whether (as opposed to how) entities such as the agent or objects have some influence on each other.
ConvBERT: Improving BERT with Span-based Dynamic Convolution
Pre-trained language models like BERT and its variants have recently achieved impressive performance in various natural language understanding tasks. However, BERT heavily relies on the global self-attention block and thus suffers large memory footprint and computation cost. Although all its attention heads query on the whole input sequence for generating the attention map from a global perspective, we observe some heads only need to learn local dependencies, which means existence of computation redundancy. We therefore propose a novel span-based dynamic convolution to replace these self-attention heads to directly model local dependencies. The novel convolution heads, together with the rest self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning. We equip BERT with this mixed attention design and build a ConvBERT model. Experiments have shown that ConvBERT significantly outperforms BERT and its variants in various downstream tasks, with lower training cost and fewer model parameters. Remarkably, ConvBERTbase model achieves 86.4 GLUE score, 0.7 higher than ELECTRAbase, using less than 1/4 training cost. Code and pre-trained models will be released.
Predicting the Formation of Induction Heads
Aoyama, Tatsuya, Wilcox, Ethan Gotlieb, Schneider, Nathan
Arguably, specialized attention heads dubbed induction heads (IHs) underlie the remarkable in-context learning (ICL) capabilities of modern language models (LMs); yet, a precise characterization of their formation remains unclear. In this study, we investigate the relationship between statistical properties of training data (for both natural and synthetic data) and IH formation. We show that (1) a simple equation combining batch size and context size predicts the point at which IHs form; (2) surface bigram repetition frequency and reliability strongly affect the formation of IHs, and we find a precise Pareto frontier in terms of these two values; and (3) local dependency with high bigram repetition frequency and reliability is sufficient for IH formation, but when the frequency and reliability are low, categoriality and the shape of the marginal distribution matter.
- Europe > Austria > Vienna (0.14)
- South America > Paraguay > Asunción > Asunción (0.04)
- Europe > Norway > Norwegian Sea (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.47)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Education (0.46)
- Leisure & Entertainment (0.46)
- Information Technology (0.46)
- North America > United States > Texas > Travis County > Austin (0.05)
- Europe > United Kingdom > England > Kent > Canterbury (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (5 more...)